Abstract

A superposition of sigmoid functions over a finite time interval is shown to be equivalent
to a linear combination of the solutions of a linearly parameterized system of logistic
differential equations. Because the system is linear in its parameters, it
is possible to design an effective procedure for parameter adjustment. The stability properties of
this procedure are analyzed.

1 Introduction

Static base functions are used in a variety of universal function-approximation schemes. Their
general form runs as follows: Let a given continuous function $g(t)$ be defined over a compact time
interval $[0,T]$. There will be a function $y(t)$, represented as

$$y(t) = \sum_{i=1}^{n} c_i f(a_i t + b_i), \qquad (1)$$

in which $f(\cdot): \mathbb{R} \to \mathbb{R}$ is a continuous function and for any given $\varepsilon > 0$, there are values of $n$, $a_i$, $b_i$,
and $c_i$, such that for all $t \in [0,T]$,

$$|g(t) - y(t)| < \varepsilon.$$
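
As a purely illustrative aid, equation (1) can be evaluated numerically. The sketch below assumes the standard logistic sigmoid for $f$ and uses hypothetical parameter values; neither choice is prescribed by the text.

```python
import math

def sigmoid(s):
    """Standard logistic sigmoid 1 / (1 + exp(-s)); an assumed choice of f."""
    return 1.0 / (1.0 + math.exp(-s))

def superposition(t, a, b, c, f=sigmoid):
    """Evaluate y(t) = sum_i c_i * f(a_i * t + b_i), i.e. equation (1)."""
    return sum(ci * f(ai * t + bi) for ai, bi, ci in zip(a, b, c))

# Hypothetical parameters: two shifted sigmoids of opposite sign form a smooth bump.
a = [4.0, 4.0]
b = [-4.0, -12.0]
c = [1.0, -1.0]
samples = [superposition(t, a, b, c) for t in (0.0, 2.0, 4.0)]
```

With these values the output is small at $t = 0$ and $t = 4$ and close to one at $t = 2$, which illustrates how a few additive terms already shape a localized feature.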

Among the functions $f(\cdot)$ for which approximation of $g(t)$ can be proven, the Gaussian and the
sigmoid are the best known. Approximation by sigmoids is often favored for, among
other reasons, its very good rate of convergence with respect to the number $n$ of additive terms in equation
(1) [1]. Recent results [11] have shown that

{w.)-.Mf* = o(l). 

Another advantage is that convergence is also possible in Sobolev space, implying the existence of 
an optimal approximator for derivatives of function g{t) JHl; 

In spite of significant progress in the fields of nonlinear optimization and neural networks (a
comprehensive review of neural learning algorithms is given in [3]), estimation of the unknown
values of parameters $a_i$, $b_i$, $c_i$ in (1) is still a difficult problem. Simple local optimization strategies
involving gradient descent may fail to converge to a global minimum because of the nonconvexity of the function with respect
to the parameters; global search algorithms JH]; prohibitively expensive computationally 

[39], and second-order search algorithms rely on assumptions relating to the error surface that are
not always met, for instance uniqueness of the extremum [1].

In order to address the parameter adjustment problem, simplifying assumptions have been
made [7]. This approach, for instance, requires that the values of each additive term $f(a_i t + b_i)$ in
(1) over $[0,T]$ be known. Under this assumption convergence to a global minimum could be proven.
The method was shown to have a very fast speed of convergence. However, the requirement that
the value of each term be known imposes severe restrictions on the applicability of this method.
Following a different strategy, in recent years several new methods have been proposed which are
capable of avoiding local minima by modifying the learning criterion (see, for instance, [22]). Yet
these methods cannot guarantee that the estimates of the unknown values of the parameters $a_i$,
$b_i$, $c_i$ converge to their true values (up to permutations). In our view the underlying problem
with these conventional methods is that, whereas they use error minimization for approximating
a solution, they lack an explicit model of error dynamics. We will propose a novel approach to
estimating the values of the parameters in (1), utilizing elements of classical control theory.

In this approach the values of function $g(t)$ are interpreted as reference signals, the outputs of
a dynamical system called the reference system. The reference signal is used in the explicit definition
of an error function as, for instance, the difference with a tracking signal. This signal, in turn,
is considered the output $y(\theta,t): \Omega_\theta \times \mathbb{R} \to \mathbb{R}$, $\theta \in \Omega_\theta$, of a dynamical
system called the tracking system, with parameter vector $\theta = (a^T, b^T, c^T)^T$, $a, b, c \in \mathbb{R}^n$, to be
determined. Thus the problem of function approximation is transformed into one of finding a
suitable parameterization for a given tracking system.

A similar strategy was used in jSH], PP for different purposes. In these studies the resulting 
equations remained nonlinear in their parameters. The presently proposed transformation, however,
will enable us to represent the problem in terms of a nonlinear system that is linear in its
parameters. The linearity allows us to apply conventional methods of adaptive control theory for
stabilizing the error dynamics and thus facilitates finding the optimal solution. For this purpose,
the learning problem is formulated as one of adaptive tracking (or, equivalently, synchronization
between the reference and tracking systems). To this problem we can apply the method of Lyapunov
functions, extending the parameter space $\Omega_\theta$ to $\{\alpha, \beta, C, x(t) \mid \alpha, \beta, C \in \mathbb{R}^n,\ x(t): \mathbb{R} \to \mathbb{R}^n\}$, and use
a simple rule for parameter adjustment in the enhanced system dynamics. This provides us with
a method potentially more powerful than, for instance, gradient descent, which operates entirely
in the original parameter space by relying on the contraction theorem.

It should be mentioned, however, that the problem of parameter value identification has not 
completely been solved even for our case of linearly parameterized, nonlinear systems. The so- 
lutions available in the literature are formulated either for linear systems [201; IHSj or for 
some special classes of nonlinear plants, assuming full state measurement ^U] or the possibility to 
transform the system into an output injection form [2^1; |2Z]- We do not wish to impose any such 
restrictions. Instead we exploit the possibility to extend both the reference and tracking signals 
to be repeated periodically starting from the same initial conditions. By doing so we significantly 
simplify the problem of searching for the optimal values of unknown parameters. 

A strategy similar to the one proposed is often used in iterative learning control 0; j^fjj : 
|33j mostly for determining a feed-forward control term which is defined as a function of time. 
The time-variability of the solution severely reduces the significance of these methods for our 
problem. Nevertheless, there are several approaches that can be applied to search for unknown 
parameters within an iterative learning control framework [21]; jlE]; [HEI- These approaches, 
however, to our knowledge, are either designed for linear dynamical systems or, when
dealing with nonlinear systems, cannot guarantee convergence to a non-local solution. This motivates
us not only to show that the entire problem of static nonlinear optimization can be transformed
into a dynamic one but also to provide an algorithm for estimating the unknown parameters of the
resulting linearly parameterized system of nonlinear differential equations.

The first step in our approach will be the selection of a "base function" for the reference and
tracking systems, suitable for representing a broad class of functions. We have chosen the logistic
differential equation [27]. We will start off by providing an existence proof for approximation in
this system. The next step will be the specification of an algorithm for parameter adjustment
that effectively finds the optimal solution in an interesting domain of functions. We consider this
problem for systems with unperturbed conditions as well as with time-varying parameters. The
former constitutes a method for representing scalar functions in one variable, for instance time;
the latter provides a method for representing functions with multiple inputs. Finally, the viability
of the approach is demonstrated in examples comparing it to gradient descent.

The paper is organized as follows. In Section 2 we formulate the problem and introduce the 
class of systems to be analyzed. In Section 3 we investigate the dynamic abilities of the system and 
prove the approximation properties of the system. In Section 4 we introduce the schemes to adjust 
the unknown parameters of the system. In Section 5 we discuss multi-dimensional approximation 
problems and show the possibility to utilize the same technique for approximation of a system 
of nonlinear differential equations with arbitrary smooth right-hand sides. Section 6 contains 
simulation results for illustrative examples. Section 7 concludes the paper. 

2 Problem Formulation 

Although the sigmoidal function approximation scheme has several attractive features, the most
important obstacle to its implementation remains the absence of an algorithm that
guarantees convergence to an optimal solution. We suggest a strategy to turn the problem of
searching for the parameter values of the static nonlinear parameterized map $f(a, b, c, t)$, $a, b, c \in \mathbb{R}^n$,
into one of searching for linear parameter values of a system of nonlinear differential equations:

$$\dot{x} = \sum_{i=1}^{n} \xi_{1,i}(x)\alpha_i + \sum_{i=1}^{n} \xi_{2,i}(x)\beta_i, \quad y(x) = C^T x, \qquad (2)$$

where $x \in \mathbb{R}^n$, $\alpha = (\alpha_1, \dots, \alpha_n)^T$, $\beta = (\beta_1, \dots, \beta_n)^T \in \mathbb{R}^n$ ¹, $\xi_{1,i}, \xi_{2,i}: \mathbb{R}^n \to \mathbb{R}^n$ are
continuous functions, and $C \in \mathbb{R}^n$. Therefore, the first problem to be addressed is the existence of
such a transformation. The proposed solution uses logistic differential equations to realize system
(2). This means we will approach function $g(t)$ with a weighted sum $y(x(t))$, for which we then
have to deal with the issue of identifying the parameter values of (2). To this purpose, in
control-theoretic terms, system (2) is considered the reference system, whereas the tracking system will
have the following description:

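$$\dot{\hat{x}} = \sum_{i=1}^{n} \xi_{1,i}(\hat{x})\hat{\alpha}_i + \sum_{i=1}^{n} \xi_{2,i}(\hat{x})\hat{\beta}_i + \eta(y(x), \hat{y}(\hat{x}), t), \quad \hat{y}(\hat{x}) = \hat{C}^T \hat{x}, \qquad (3)$$

where $\hat{x} \in \mathbb{R}^n$, $\hat{\alpha} = (\hat{\alpha}_1, \dots, \hat{\alpha}_n)^T$, $\hat{\beta} = (\hat{\beta}_1, \dots, \hat{\beta}_n)^T \in \mathbb{R}^n$, $\hat{C} \in \mathbb{R}^n$. Note the similarity in
structure between the tracking and reference systems, except for an error function $\eta: \mathbb{R}^3 \to \mathbb{R}^n$ added to
the tracking system. In what follows the symbols $x(t)$, $\hat{x}(t)$ denote the solutions of differential
equations (2), (3) with parameters $\alpha$, $\beta$ ($\hat{\alpha}$ and $\hat{\beta}$) and starting from initial conditions $x_0$. Sometimes,
in order to stress this dependence explicitly, we will write $x(\alpha, \beta, x_0, t)$ or $\hat{x}(\hat{\alpha}, \hat{\beta}, x_0, t)$.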

¹ We would like to note that the dimensions of the vectors $\alpha$ and $\beta$ are not necessarily equal to $n$. Although we
do not discuss any other parameterization, a variety of alternative descriptions with different parameterizations is
possible.

As both the reference and tracking systems are described in the same manner, it is natural
to consider the combined system, which couples the reference system to the tracking system via the output $y(x(t))$
through the error function $\eta(y(x), \hat{y}(\hat{x}))$:

$$\dot{x} = \sum_{i=1}^{n} \xi_{1,i}(x)\alpha_i + \sum_{i=1}^{n} \xi_{2,i}(x)\beta_i, \quad y(x) = C^T x,$$

$$\dot{\hat{x}} = \sum_{i=1}^{n} \xi_{1,i}(\hat{x})\hat{\alpha}_i + \sum_{i=1}^{n} \xi_{2,i}(\hat{x})\hat{\beta}_i + \eta(y(x), \hat{y}(\hat{x}), t), \quad \hat{y}(\hat{x}) = \hat{C}^T \hat{x}. \qquad (4)$$

It is then possible to estimate the unknown parameters $\alpha$, $\beta$, $C$ of the reference system. We start
out by assuming that the only uncertainties are in the vectors $\alpha$ and $\beta$, while vector $C$ is supposed
to be known. We will propose an algorithm for parameter adjustment that is capable of finding
the solution. Our learning algorithm will belong to the following class:

$$\dot{\hat{\alpha}} = \mathcal{A}(y(x), \hat{y}(\hat{x}), \hat{x}); \quad \dot{\hat{\beta}} = \mathcal{B}(y(x), \hat{y}(\hat{x}), \hat{x}), \qquad (5)$$

where operators $\mathcal{A}(\cdot)$ and $\mathcal{B}(\cdot)$ are to be determined on the basis of the speed-gradient algorithm
[12]. If this strategy works, an extension would be to consider cases where the reference system
does not represent function $g(t)$ completely (i.e. systems with unmodeled dynamics).

Thus, the questions to be addressed are: is it possible (at least in theory) to transform a 
problem of nonlinear static optimization into a problem of searching for linearly parameterized 
nonlinear differential equations? If so, then how to estimate the parameters of this nonlinear 
dynamical system in order to obtain qualitative approximation? The next sections will provide 
us with the answers. 

3 Approximation with Logistic Differential Equations 

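Let the following system be given:

$$\dot{x}_1 = \alpha_1 x_1 (1 - \beta_1 x_1);$$
$$\dot{x}_2 = \alpha_2 x_2 (1 - \beta_2 x_2);$$
$$\cdots$$
$$\dot{x}_n = \alpha_n x_n (1 - \beta_n x_n);$$
$$y(x) = C^T x = \sum_{i} c_i x_i, \quad x_i(0) = A_i, \qquad (6)$$

where $x = (x_1, \dots, x_n)^T \in \mathbb{R}^n$ is a state vector, $\alpha_i, \beta_i \in \mathbb{R}$ are parameters of system (6), $y$ is an
output function, $C = (c_1, \dots, c_n)^T \in \mathbb{R}^n$ is a vector of parameters associated with output $y$, and
$x_i(0) \in \mathbb{R}$ are initial conditions.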

We begin our investigation by asking the question: what dynamics can the autonomous system
(6) produce as a function of $t$? The answer to this question is formulated in the following theorem:

Theorem 1 Let a continuously differentiable function $g(t): \mathbb{R} \to \mathbb{R}$ be given. Then for any $\varepsilon > 0$,
$0 < T < \infty$ and $t \in [0,T]$ there are numbers $n$, $\alpha_i$, $\beta_i$, $c_i$ and initial conditions $x_i(0) = A_i$
such that the following inequality holds:

$$|y(x(t)) - g(t)| < \varepsilon.$$

The proof of Theorem 1 is quite straightforward and is based on the known fact that the solution of the logistic
differential equation of the first order can be given by a sigmoidal function [SB]- Nevertheless, in 
order to make the paper self-contained we present the proof in the Appendix. Proofs of the 
subsequent theorems and lemmas are given in the Appendix as well. 
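
The fact used in the proof can be checked numerically. The sketch below is an illustration under stated assumptions, not part of the proof: it integrates a single logistic equation $\dot{x} = \alpha x (1 - \beta x)$ with a classical fourth-order Runge-Kutta scheme and compares the result with the scaled sigmoid $x(t) = \beta^{-1} / (1 + e^{-(\alpha t + b)})$, $b = -\ln(1/(\beta x_0) - 1)$, which is valid for $0 < \beta x_0 < 1$. The function names and numerical values are hypothetical.

```python
import math

def logistic_rhs(x, alpha, beta):
    """Right-hand side of one logistic equation: alpha * x * (1 - beta * x)."""
    return alpha * x * (1.0 - beta * x)

def rk4(x0, alpha, beta, t_end, steps):
    """Integrate the logistic equation from 0 to t_end with classical RK4."""
    h = t_end / steps
    x = x0
    for _ in range(steps):
        k1 = logistic_rhs(x, alpha, beta)
        k2 = logistic_rhs(x + 0.5 * h * k1, alpha, beta)
        k3 = logistic_rhs(x + 0.5 * h * k2, alpha, beta)
        k4 = logistic_rhs(x + h * k3, alpha, beta)
        x += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return x

def closed_form(t, x0, alpha, beta):
    """Scaled sigmoid solution; requires 0 < beta * x0 < 1."""
    b = -math.log(1.0 / (beta * x0) - 1.0)
    return (1.0 / beta) / (1.0 + math.exp(-(alpha * t + b)))

# Hypothetical values chosen only for illustration.
alpha, beta, x0, t_end = 1.5, 0.5, 0.2, 3.0
numeric = rk4(x0, alpha, beta, t_end, steps=300)
exact = closed_form(t_end, x0, alpha, beta)
```

The slope of the sigmoid is set by $\alpha$, its saturation level by $1/\beta$, and its shift by the initial condition, which is why the static parameters of (1) can be traded for parameters and initial conditions of the dynamical system.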

Remark 1 It follows from the proof of Theorem 1 that it is possible to transform the problem of
nonlinear function approximation by static sigmoidal functions into a problem of choosing initial
conditions and parameters $\alpha_i$, $\beta_i$ and $c_i$ of dynamical system (6), where the parameters enter (6)
linearly. One can observe, in addition, that under an appropriate linear transformation $x_i \to x_i / c_i$
($c_i \neq 0$) we can get rid of uncertainties in $C$ (see the remark following Lemma 2 in Appendix 1) and
replace system (6) by

$$\dot{x}_i = \alpha_i x_i + \beta_i x_i^2,$$
$$y(x) = \sum_{i} x_i, \quad x_i(0) = A_i / c_i, \qquad (7)$$

where $\alpha_i$ and $\beta_i$ are to be determined. We formulate this in

Corollary 1 Let system (7) and a continuously differentiable function $g(t): \mathbb{R} \to \mathbb{R}$ be given. Then
for any $\varepsilon > 0$, $0 < T < \infty$ and $t \in [0,T]$ there are numbers $n$, $\alpha_i$, $\beta_i$ and initial conditions
$x_i(0)$ such that the following inequality holds:

$$|y(x(t)) - g(t)| < \varepsilon.$$

This result will allow us to turn the problem of determining the nonlinear parameters of a static
function into a problem of determining the linear parameters $\alpha_i$, $\beta_i$ of system (7). The restriction
is that the values $x_i(0)$ have to be known.

Remark 2 Theorem 1 proves that there is a one-to-one transformation of a function approximation
problem in terms of static sigmoidal functions to one in terms of logistic differential equations.
The latter, therefore, shares all the advantages of the former, including the very good convergence
rate [1] and its application in Sobolev space [17].

Theorem 1 merely states the existence of parameters $\alpha_i$, $\beta_i$ and $c_i$ of system (6) (or $\alpha_i$ and $\beta_i$ of system
(7)) that ensure arbitrarily small errors between the system output and the reference function $g(t)$.
It does not answer the question of how to derive the parameters. However, the linearity of the system
in its parameters simplifies our task. We will show in Section 5 that in the multidimensional case
the resulting system will be linearly parameterized as well. In the next section we will turn to the
issue of how to find the values of the parameters that yield minimum errors.

4 Parameter Adjustment Algorithm 

The question is whether it is possible to estimate the unknown parameter values $\alpha_i$, $\beta_i$ for which
$g(t) - y(x(t)) = 0$ for $t \in [0,T]$, utilizing the linear parameterization of system (7). For designing
the estimation algorithm the following strategy was used. First, it is assumed that the only
uncertainties are in the linear parameters $\alpha_i$, $\beta_i$, while the initial conditions $x(0)$ are assumed to be known;
we formulate this in Assumption 1 and present our main algorithm under it. Second, after this
algorithm is given, we extend it to the cases where the reference system does not represent the
function $g(t)$ completely, i.e., with unmodeled dynamics. It will be possible to invoke Theorem
1 and show that any function that is merely approached by the reference system dynamics can still
effectively be modelled by the tracking system, albeit within a margin of tolerance.

In order to proceed with the analysis we would like to introduce the following assumption: 

Assumption 1 Let a continuous function $g(t)$, the number of equations $n$ and initial conditions $x_i(0)$
be given. There exist parameter values $\alpha_i$ and $\beta_i$ such that for any $t \in [0,T]$ the following equality
holds for the solutions of system (7):

$$g(t) - \sum_{i=1}^{n} c_i x_i(t) = 0.$$

Assumption 1 states that the reference signal $g(t)$ can be represented by the output of system (7):

$$g(t) = \sum_{i=1}^{n} c_i x_i(\alpha_i, \beta_i, x_i(0), t).$$

The coefficients $c_i$ can be taken equal to unity.

In order to make the presentation clearer and more compact, we would like to introduce a
notational assumption regarding the tracking and reference systems. Let us redefine the system
equations, denoting the right-hand side of (7) by $\sum_{i=1}^{n} \xi_{1,i}(x)\alpha_i + \sum_{i=1}^{n} \xi_{2,i}(x)\beta_i$, where

$$\xi_{1,i}(x) = (\underbrace{0, \dots, 0}_{i-1},\ x_i,\ \underbrace{0, \dots, 0}_{n-i})^T, \qquad \xi_{2,i}(x) = (\underbrace{0, \dots, 0}_{i-1},\ x_i^2,\ \underbrace{0, \dots, 0}_{n-i})^T.$$
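
A small sketch may help to see how this notation expresses the right-hand side of (7) as a sum that is linear in the parameters. It assumes, as system (7) implies, that $\xi_{1,i}(x)$ carries $x_i$ and $\xi_{2,i}(x)$ carries $x_i^2$ in the $i$-th position and zeros elsewhere. The function names are hypothetical and indices are zero-based in the code.

```python
def xi1(x, i):
    """Vector with x[i] in position i and zeros elsewhere."""
    v = [0.0] * len(x)
    v[i] = x[i]
    return v

def xi2(x, i):
    """Vector with x[i]**2 in position i and zeros elsewhere."""
    v = [0.0] * len(x)
    v[i] = x[i] ** 2
    return v

def rhs(x, alpha, beta):
    """Compute sum_i xi1(x, i) * alpha[i] + sum_i xi2(x, i) * beta[i]."""
    n = len(x)
    out = [0.0] * n
    for i in range(n):
        v1, v2 = xi1(x, i), xi2(x, i)
        for j in range(n):
            out[j] += v1[j] * alpha[i] + v2[j] * beta[i]
    return out

# Hypothetical state and parameters.
x = [1.0, 2.0]
dx = rhs(x, alpha=[3.0, -1.0], beta=[0.5, 0.25])
```

Each component of the result equals $\alpha_i x_i + \beta_i x_i^2$, so for a fixed state the right-hand side is a linear function of $(\alpha, \beta)$.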




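Then both the reference and tracking systems can be rewritten in the compact form introduced in
Section 2:

$$\dot{x} = \sum_{i=1}^{n} \xi_{1,i}(x)\alpha_i + \sum_{i=1}^{n} \xi_{2,i}(x)\beta_i, \quad y(x) = C^T x,$$

$$\dot{\hat{x}} = \sum_{i=1}^{n} \xi_{1,i}(\hat{x})\hat{\alpha}_i + \sum_{i=1}^{n} \xi_{2,i}(\hat{x})\hat{\beta}_i + \eta(y(x), \hat{y}(\hat{x}), t), \quad \hat{y}(\hat{x}) = \hat{C}^T \hat{x},$$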

where $C = \hat{C} = (1, \dots, 1)^T$. Hence, to complete the definitions of the reference and tracking systems
one needs to determine $\eta(y(x), \hat{y}(\hat{x}), t)$. One possible way to do this is to define the function
$\eta(y(x), \hat{y}(\hat{x}), t)$ as follows:

$$\eta(y(x), \hat{y}(\hat{x}), t) = K(t)(\hat{y}(\hat{x}) - y(x)),$$

where $K(t) = (k_1(t), \dots, k_n(t))^T$ and $k_i(t)$ are to be specified later. The reason for such a structure
is that we need the tracking system "to copy" the reference dynamics along the manifold $\hat{y}(\hat{x}) - y(x) = 0$.
Thus, an aggregated system which contains both the reference system for signal $g(t)$
and the tracking system (7) can be written in the following form:



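$$\dot{x} = \sum_{i=1}^{n} \xi_{1,i}(x)\alpha_i + \sum_{i=1}^{n} \xi_{2,i}(x)\beta_i, \quad y(x) = C^T x,$$

$$\dot{\hat{x}} = \sum_{i=1}^{n} \xi_{1,i}(\hat{x})\hat{\alpha}_i + \sum_{i=1}^{n} \xi_{2,i}(\hat{x})\hat{\beta}_i + K(t)(\hat{y}(\hat{x}) - y(x)), \quad \hat{y}(\hat{x}) = C^T \hat{x}. \qquad (8)$$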



As mentioned at the beginning of the section, we would like to obtain estimates $\hat{\alpha}_i$, $\hat{\beta}_i$ of
the parameters such that $g(t) - \hat{y}(\hat{x}(t)) = 0$ over the time interval $[0,T]$. It was proposed in Section 2
to utilize conventional speed-gradient-like techniques to design the learning or adaptation rule. For
these methods, the parameters are supposed to be adjusted on-line, that is, on the same time scale
as the reference and tracking systems evolve. In general, it may take much more time than $T$ (the
length of the interval $[0,T]$) for the estimates $\hat{\alpha}_i$, $\hat{\beta}_i$ to converge to $\alpha_i$, $\beta_i$. However, the function
$g(t)$ may not be defined for $t > T$, and even if it is well defined over $[T, \infty)$, the equivalence
$y(x(\alpha, \beta, t, x_0)) = \hat{y}(\hat{x}(\hat{\alpha}, \hat{\beta}, t, x_0))$ for $t > T$ does not imply that $g(t) = \hat{y}(\hat{x}(\hat{\alpha}, \hat{\beta}, t, x_0))$ for any
$t \in [0,T]$.

In addition, we note that the logistic equations (7) can be very unstable and may have finite
escape time depending on the vectors $\alpha$ and $\beta$. For the reference system this is not important, as
we assumed that every solution $x_i$ of (7) can be described by a sigmoid function and is therefore
bounded. For the tracking system, however, stability becomes crucial. It is quite possible
that during the adjustment of $\hat{\alpha}$ and $\hat{\beta}$, and due to the term $K(t)(\hat{y}(\hat{x}) - y(x))$ in (8), the state $\hat{x}$ of the
tracking system reaches infinity in finite time, thus making the whole system unstable.

Taking these considerations into account, it is necessary to redesign the reference and tracking
systems in such a way that: 1) $\hat{y}(\hat{x}(\hat{\alpha}, \hat{\beta}, t, x_0)) \to y(x(\alpha, \beta, t, x_0))$ as $t \to \infty$ implies that
$|g(t) - \hat{y}(\hat{x}(\hat{\alpha}, \hat{\beta}, t, x_0))| < \varepsilon$ for any $\varepsilon > 0$ and arbitrary $t \in [0,T]$; and 2) the state $\hat{x}$ of the tracking
system remains bounded for any $t > 0$.

Our proposed solution to problem 1) is to let the reference signal $g(t)$ be repeated periodically
(see Fig. 1, where the initial signal $g(t)$ is extended periodically along axis $t$). Periodicity can
be achieved by introducing special terms ($\lambda$ and $\sigma$ below) into the right-hand sides of the systems that
will periodically force the states to move to $x_0$ (with period $T_1 = T + \Delta T_2$, where $\Delta T_2$ is the amount
of time needed to reach $x_0$). In order to solve problem 2) we have to make sure that the state $\hat{x}$
of the tracking system is bounded for any $t > 0$. This can be achieved if we force the states
of both systems to move to $x_0$ as soon as $\|x\|$ exceeds a certain bound $D$. Roughly speaking, one
can add time-varying negative feedback to both the reference and tracking systems, thus making the
point $x_0$ globally asymptotically stable for both systems and, in addition, allowing the output
$y(x(\alpha, \beta, t, x_0))$ of the reference system to coincide periodically with the segments of trajectory
$g(t)$ defined over $[0,T]$.

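In order to satisfy these requirements we introduce the next

Assumption 2 There is a positive constant $f_0 > 0$ and a function $\lambda: \mathbb{R}^2 \to \mathbb{R}$,

$$\lambda(t, D) = \begin{cases} 0, & t \in [(j-1)T_1,\ jT_1 - \Delta T_2) \text{ and } \|x(t)\| < D, \\ 1, & \text{otherwise}, \end{cases}$$

such that the reference signal is given by the following system:

$$\dot{x} = \left( \sum_{i=1}^{n} \xi_{1,i}(x)\alpha_i + \sum_{i=1}^{n} \xi_{2,i}(x)\beta_i \right)(1 - \lambda(t, D)) - \lambda(t, D) f_0\, \sigma(x - x(0)),$$

$$y(x(t)) = \tilde{g}(t),$$

where $\sigma(\cdot)$ is a signum function:

$$\sigma(\cdot) = (\sigma_1(\cdot), \dots, \sigma_n(\cdot))^T: \quad \sigma_i(x - x(0)) = \begin{cases} 1, & x_i - x_i(0) > 0; \\ 0, & x_i - x_i(0) = 0; \\ -1, & x_i - x_i(0) < 0, \end{cases}$$

$f_0 > D / \Delta T_2$, $\tilde{g}(t_1)$, $t_1 \in [0, \infty)$, is an extension of $g(t)$, $t \in [0,T]$, and $T_1 = T + \Delta T_2$.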
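
The switching logic of Assumption 2 can be sketched as follows. This is an illustration under our reading of the assumption: $\lambda(t, D)$ is taken to be 0 during the first $T$ time units of every period $T_1 = T + \Delta T_2$ while the state norm stays below $D$, and 1 otherwise. The function names and numerical values are hypothetical.

```python
def switching(t, x_norm, T, dT2, D):
    """lambda(t, D): 0.0 in the tracking phase of each period, 1.0 in the reset phase
    or whenever the state norm reaches the bound D (assumed reading of Assumption 2)."""
    T1 = T + dT2
    phase = t % T1
    return 0.0 if (phase < T and x_norm < D) else 1.0

def signum(v):
    """Return 1, 0 or -1 according to the sign of v."""
    return (v > 0) - (v < 0)

def reset_term(x, x0, f0):
    """Component-wise reset feedback -f0 * sigma(x - x0), active when lambda = 1."""
    return [-f0 * signum(xi - x0i) for xi, x0i in zip(x, x0)]

# Hypothetical values: T = 1.0, dT2 = 0.5, D = 10.0, f0 = 3.0.
lam_track = switching(0.2, 0.5, 1.0, 0.5, 10.0)
lam_reset = switching(1.2, 0.5, 1.0, 0.5, 10.0)
push = reset_term([2.0, 1.0, 0.0], [1.0, 1.0, 1.0], 3.0)
```

During the reset phase each state component moves towards its initial value at constant speed $f_0$, which suggests why a condition of the form $f_0 > D / \Delta T_2$ is imposed: a component whose deviation is at most $D$ is then brought back within $\Delta T_2$.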

Assumption 2 requires the inclusion of several extra parameters and functions into the right-hand
side of the generating system. These additional restrictions are introduced just to make sure that
for each $t = jT_1$ the following holds:

$$x_i(t) = x_i(jT_1) = \hat{x}_i(t) = \hat{x}_i(jT_1) = x_i(0), \quad j \in \{1, 2, \dots\}, \quad i \in \{1, \dots, n\}.$$

Taking into account Assumption 2 and the fact that the tracking system is designed to copy
the structure of the reference system, we can write the combined reference and tracking systems
as follows:

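$$\dot{x} = \left( \sum_{i=1}^{n} \xi_{1,i}(x)\alpha_i + \sum_{i=1}^{n} \xi_{2,i}(x)\beta_i \right)(1 - \lambda(t, D)) - \lambda(t, D) f_0\, \sigma(x - x(0)),$$

$$\dot{\hat{x}} = \left( \sum_{i=1}^{n} \xi_{1,i}(\hat{x})\hat{\alpha}_i + \sum_{i=1}^{n} \xi_{2,i}(\hat{x})\hat{\beta}_i + K(t)(\hat{y}(\hat{x}) - y(x)) \right)(1 - \lambda(t, D)) - \lambda(t, D) f_0\, \sigma(\hat{x} - x(0)),$$

$$y(t) = y(x(t)) = C^T x(t), \quad \hat{y}(t) = \hat{y}(\hat{x}(t)) = C^T \hat{x}(t). \qquad (9)$$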

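Before we introduce an adjustment rule for the tracking system let us formulate the following
lemma:

Lemma 1 Let system (9) be given and $C^T \neq 0$. Consider

$$\left| C^T \sum_{i=1}^{n} \left( \alpha_i(\xi_{1,i}(\hat{x}) - \xi_{1,i}(x)) + \beta_i(\xi_{2,i}(\hat{x}) - \xi_{2,i}(x)) \right)(1 - \lambda(t, D)) \right| + e \sum_{i=1}^{n} k_i c_i.$$

Then for any given constant $\delta > 0$ there exist $k_i = k_i^* \in \mathbb{R}$ such that

$$\left| C^T \sum_{i=1}^{n} \left( \alpha_i(\xi_{1,i}(\hat{x}) - \xi_{1,i}(x)) + \beta_i(\xi_{2,i}(\hat{x}) - \xi_{2,i}(x)) \right)(1 - \lambda(t, D)) \right| + e \sum_{i=1}^{n} k_i^* c_i < 0 \qquad (10)$$

for any $e > \delta$.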

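According to Lemma 1, for any positive $\delta > 0$ the existence of the coefficients $k_i^*$ satisfying
inequality (10) is guaranteed. This property is very important for the subsequent analysis. In
fact, it states that the error function $e = \hat{y}(t) - y(t)$ is attracted to the domain $|e| < \delta$ at $\hat{\alpha} = \alpha$,
$\hat{\beta} = \beta$, $\lambda(t, D) = 0$ and $k_i(t) = k_i^*$, as

$$\dot{e} = \left( C^T \sum_{i=1}^{n} \left( \alpha_i(\xi_{1,i}(\hat{x}) - \xi_{1,i}(x)) + \beta_i(\xi_{2,i}(\hat{x}) - \xi_{2,i}(x)) \right) + e \sum_{i=1}^{n} k_i^* c_i \right)(1 - \lambda(t, D))$$

and

$$\frac{d}{dt}(0.5 e^2) = e \dot{e} < 0, \quad \forall\, |e| > \delta.$$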






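Let us introduce the adjustment rules for parameters $\hat{\alpha}_i$, $\hat{\beta}_i$:

$$\dot{\hat{\alpha}}_i = -\gamma e(t) S_\delta(e) C^T \xi_{1,i}(\hat{x})(1 - \lambda(t, D)),$$

$$\dot{\hat{\beta}}_i = -\gamma e(t) S_\delta(e) C^T \xi_{2,i}(\hat{x})(1 - \lambda(t, D)), \qquad (11)$$

$$S_\delta(e) = \begin{cases} 1, & |e| > \delta, \\ 0, & |e| \le \delta, \end{cases}$$

where $e(t) = \hat{y}(t) - y(t)$ is the tracking error and $\gamma > 0$ is a positive constant.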
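
The right-hand sides of the adjustment rules (11) can be sketched as follows. The sketch assumes $C = (1, \dots, 1)^T$ and the logistic regressors of system (7), so that $C^T \xi_{1,i}(\hat{x}) = \hat{x}_i$ and $C^T \xi_{2,i}(\hat{x}) = \hat{x}_i^2$. The function names and numerical values are hypothetical.

```python
def dead_zone(e, delta):
    """S_delta(e): 1.0 outside the dead zone |e| > delta, 0.0 inside."""
    return 1.0 if abs(e) > delta else 0.0

def adaptation_rates(e, x_hat, gamma, delta, lam):
    """Time derivatives of the estimates according to (11), assuming C = (1, ..., 1)^T.

    e     : tracking error y_hat - y
    x_hat : state of the tracking system
    lam   : current value of the switching function lambda(t, D)
    """
    g = -gamma * e * dead_zone(e, delta) * (1.0 - lam)
    d_alpha = [g * xi for xi in x_hat]
    d_beta = [g * xi * xi for xi in x_hat]
    return d_alpha, d_beta

# Hypothetical values.
d_alpha, d_beta = adaptation_rates(e=0.5, x_hat=[1.0, 2.0], gamma=2.0, delta=0.1, lam=0.0)
frozen = adaptation_rates(e=0.05, x_hat=[1.0, 2.0], gamma=2.0, delta=0.1, lam=0.0)
```

Adaptation stops both inside the dead zone and during the reset phase ($\lambda = 1$), so the estimates only move while the tracking error carries usable information.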

The stability properties of system (9) with algorithm (11) are formulated in:

Theorem 2 Let Assumptions 1 and 2 hold, vector $C \neq 0$, and function $K(t) = (k_1(t), \dots, k_n(t))^T$ in
(9) be given by the following system of differential equations:

$$\dot{k}_i = -\gamma S_\delta(e) e^2 c_i (1 - \lambda(t, D)). \qquad (12)$$

Then for any positive $\gamma > 0$ all trajectories of system (9) are bounded, and there exists $t_1 > 0$ such
that for any $t > t_1$ the following inequality holds:

$$|\hat{y}(\hat{x}) - y(x)| < \delta + \delta_1, \quad \delta_1 > 0.$$
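
The gain adjustment (12) can be sketched in the same spirit. The function name and the numerical values below are hypothetical.

```python
def gain_rates(e, c, gamma, delta, lam):
    """Time derivatives of the feedback gains k_i according to (12)."""
    s = 1.0 if abs(e) > delta else 0.0  # dead-zone indicator S_delta(e)
    return [-gamma * s * e * e * ci * (1.0 - lam) for ci in c]

# Hypothetical values: the sign of e does not matter because it enters squared.
dk = gain_rates(e=-0.5, c=[1.0, 2.0], gamma=2.0, delta=0.1, lam=0.0)
```

For positive $c_i$ the rates are never positive, so under this rule the gains can only decrease while the error stays outside the dead zone; the feedback term $K(t)(\hat{y} - y)$ therefore becomes progressively more stabilizing.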

Remark 3 Theorem |21 guarantees that function e{t)X{t, D) in system converges to the domain 
\e(t)\(t, D)\ < 6, where constant 6 is defined in learning algorithm (fTTj). Formally, \e(t)X(t, D)\ < 6 
does not automatically imply that estimates a, (3) converge to the point a = a, (3 = P in the 
parameter space. Nevertheless, according to formula (see Appendix, proof of Theorem |21), 
one can derive the following estimate of how close we are to the solution 

> ||K(t)-r||^_, -||K(to)-F||^_, +2 / Ss{eMT)X{r,D)Y,k*c,\6,dT. (13) 

Equation (fT!?|l may be taken to reflect the quality of estimation of the unknown parameters a and 
p. In particular, if we choose K(to) = 0, then 

||d(to) - a||?-i + \mo) - /3||?-. - ||d(t) - «||5_, - 0{t) - /3||^^i 
> ||ir(t)-A;*||2_, -||F||2_i+2 / Ss{eMT)X{r,D)J2k*c,\6^dr. 

Therefore, the smaller the norm ||i^(t)||, the greater is the chance that the difference 

iid(to) - aii;-i + \mo) - pf^-i - \m) - «ii?-i - wm - /^ii^-i. (m) 

is nonnegative. On the other hand, given the values of 6, 61, D, C and bounds for a, (3, one 
can explicitly estimate vector k*, satisfying inequality (jlOj) . Hence in this case formula ()13|1 gives 
explicit bounds for the deviations of the estimates $\hat{\alpha}$, $\hat{\beta}$ with respect to $\alpha$ and $\beta$. Furthermore, for
known $k^*$ it is possible to get rid of the time-varying coefficients $k_i(t)$ in (9), replacing them by $k_i^*$. In
this case difference (14) is positive if $|e(t)\lambda(t, D)|$ exceeds $\delta$ at some time $t_1$.

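In general, in order to ensure the positiveness of difference (14) for a given parameterization
of the reference system, it is necessary to consider more carefully the dynamics of the following
deviation $p = \hat{x} - x$ at $\hat{\alpha} = \alpha$ and $\hat{\beta} = \beta$ over the time intervals where $\lambda(t, D) = 0$:

$$\dot{p} = \sum_{i=1}^{n} \left( \alpha_i(\xi_{1,i}(\hat{x}) - \xi_{1,i}(x)) + \beta_i(\xi_{2,i}(\hat{x}) - \xi_{2,i}(x)) \right) + K(t) C^T (\hat{x} - x).$$

Functions $\xi_{1,i}$ and $\xi_{2,i}$ are differentiable with respect to their arguments. Therefore there exist
$\Xi_{1,i}(\hat{x}, x)$ and $\Xi_{2,i}(\hat{x}, x)$ such that the following equalities hold:

$$\Xi_{1,i}(\hat{x}, x)(\hat{x} - x) = \xi_{1,i}(\hat{x}) - \xi_{1,i}(x),$$

$$\Xi_{2,i}(\hat{x}, x)(\hat{x} - x) = \xi_{2,i}(\hat{x}) - \xi_{2,i}(x).$$

Then the derivative $\dot{p}$ can be written in the following form:

$$\dot{p} = \left( \sum_{i=1}^{n} \left( \alpha_i \Xi_{1,i}(\hat{x}, x) + \beta_i \Xi_{2,i}(\hat{x}, x) \right) + K(t) C^T \right) p. \qquad (15)$$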
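
For the logistic regressors such matrices can be written down explicitly. The sketch below assumes $\xi_{1,i}(x)$ has $x_i$ and $\xi_{2,i}(x)$ has $x_i^2$ in the $i$-th position, as implied by system (7); then one admissible choice is a matrix with the single nonzero entry 1 (respectively $\hat{x}_i + x_i$) at position $(i, i)$, because $\hat{x}_i^2 - x_i^2 = (\hat{x}_i + x_i)(\hat{x}_i - x_i)$. All function names are hypothetical and indices are zero-based.

```python
def unit_diag(n, i, value):
    """n-by-n matrix (list of rows) with `value` at (i, i) and zeros elsewhere."""
    m = [[0.0] * n for _ in range(n)]
    m[i][i] = value
    return m

def big_xi1(x_hat, x, i):
    """One admissible matrix for xi_{1,i}(x_hat) - xi_{1,i}(x)."""
    return unit_diag(len(x), i, 1.0)

def big_xi2(x_hat, x, i):
    """One admissible matrix for xi_{2,i}(x_hat) - xi_{2,i}(x)."""
    return unit_diag(len(x), i, x_hat[i] + x[i])

def matvec(m, v):
    """Plain matrix-vector product."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]

# Hypothetical states of the tracking and reference systems.
x_hat, x = [1.5, 2.0], [0.5, 1.0]
p = [a - b for a, b in zip(x_hat, x)]
lin1 = matvec(big_xi1(x_hat, x, 0), p)   # compare with (x_hat_0 - x_0, 0)
quad1 = matvec(big_xi2(x_hat, x, 1), p)  # compare with (0, x_hat_1^2 - x_1^2)
```

With these matrices the bracket in (15) is diagonal up to the rank-one term $K(t) C^T$, which is the only coupling between the components of $p$.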

It can be derived from the proof of Theorem 2 that the existence of a positive function $V(\hat{y}(\hat{x}), y(x))$
with time derivative $\dot{V}$ at $\hat{\alpha} = \alpha$, $\hat{\beta} = \beta$ satisfying

$$\dot{V}(\hat{y}(\hat{x}), y(x)) \le -W(\hat{y}(\hat{x}) - y(x)), \qquad (16)$$

where $W(\cdot)$ is a positive definite function, guarantees a monotonic increase of the difference (14).
Therefore, if one can find a vector $K(t)$ such that it asymptotically stabilizes system (15) for the
given domain of parameters $\alpha$, $\beta$, and, furthermore, inequality (16) holds, then the positiveness of
difference (14) is guaranteed. The problem of determining $K(t)$, however, is not very easy to solve,
especially for nonlinear systems. Even for linear ones, a similar problem known in the literature as 
the Brockett problem^ [H] has positive solutions at present for systems of second and third order 
[211 EH! • Nevertheless, despite the obvious difficulties, we believe that the question of searching for 
a suitable $K(t)$ ensuring inequality (16) for system (15) could be an achievable goal for future
studies.

² Let the following triplet of matrices be given: $A, B, C \in \mathbb{R}^{n \times n}$. Under what conditions does a time-variant
matrix $K(t)$ exist such that the system

$$\dot{x} = Ax + BK(t)Cx, \quad x \in \mathbb{R}^n$$

is asymptotically stable?

It is worth noting that Theorem 2 requires the validity of Assumption 1. Assumption 1
allowed us to model the function $g(t)$ by a reference system of the same structure as the tracking
one. This feature has been exploited in the proof of the theorem and played an important role in
guaranteeing convergence of the errors to a neighborhood of the origin. This assumption may be
too restrictive, as it requires strict equivalence between the reference and tracking signals for $\hat{\alpha} = \alpha$,
$\hat{\beta} = \beta$. We are now ready to abandon this assumption by invoking Theorem 1 again.

If Assumption 1 does not hold, this leads to a nonzero error $\varepsilon(t)$ between the output $y(x) = C^T x(t)$
of the reference system and the signal $g(t)$ to be tracked:

$$\varepsilon(t) = \sum_{i=1}^{n} c_i x_i(t) - g(t).$$

Let us assume that $g(t)$ is continuously differentiable; then $\varepsilon(t)$ is differentiable as well. We denote
its first derivative by $d\varepsilon(t)$:

$$\frac{d}{dt}\left( y(x(t)) - g(t) \right) = \sum_{i=1}^{n} c_i \dot{x}_i - \dot{g}(t) = d\varepsilon(t). \qquad (17)$$

Due to the compactness of the interval $[0,T]$ we can conclude that the derivative $d\varepsilon(t)$ is bounded:

$$|d\varepsilon(t)| < s.$$

Let us derive the error e(t) = y(x) — g(t) = y(x) + e{t) — ?/(x) dynamics taking into account 
that C = C and, in addition, that function e{t) can be considered as an unmeasured disturbance 
subtracted from the output y(x(t)) generated by the reference system ^I^i: 

^ = - + A6,.(x) - A6,^(x)^ (1 - X{t, D)) - de{t) 

{K{t){y{iL) - 2/(x) + e{t)){l - A(t, D)) + /o(a(x - xq) - a(x - xo))A(t, D)) (18) 

The only difference between the error dynamics according to Assumption 1 and the expression given
in (18) is in the term $d\varepsilon(t) + C^T K(t)\varepsilon(t)$, which represents the unmodeled dynamics of $g(t)$.

There are several ways to deal with such an uncertainty. One of them is to include a dead-zone
into the parameter adjustment scheme [22] and to choose $K(t) = \text{const}$. The algorithms with a
dead-zone will have the same form as (11):

$$\dot{\hat{\alpha}}_i = -\gamma e(t) S_\delta(e) C^T \xi_{1,i}(\hat{x})(1 - \lambda(t, D)),$$

$$\dot{\hat{\beta}}_i = -\gamma e(t) S_\delta(e) C^T \xi_{2,i}(\hat{x})(1 - \lambda(t, D)), \qquad (19)$$

$$S_\delta(e) = \begin{cases} 1, & |e| > \delta, \\ 0, & |e| \le \delta, \end{cases}$$

except that the width $\delta$ of the dead-zone is to depend on the bounds for $d\varepsilon(t)$ and $C^T K \varepsilon(t)$.
Theoretical analysis of the stability of the whole system with learning rule (19) can be done in
the same manner as with (11).

It is clear that the tolerance of the resulting learning process will depend on the dead-zone
width $\delta$, which is exactly the upper bound of $|d\varepsilon(t) + C^T K \varepsilon(t)|$. Therefore, in general, the applicability
of the proposed learning rules strongly depends on the smoothness of $\varepsilon(t)$ (in the sense of the
maximum absolute value of its first derivative). We may deal with this issue by referring to the
properties of this approximation scheme in Sobolev space JZI;5H]- It can be shown that for any 
arbitrarily small $\delta_2 > 0$ there exists a network that can approximate a given reference function $g(t)$
such that both the derivative $d\varepsilon(t)$ and $\varepsilon(t)$ satisfy the following estimate: $|d\varepsilon(t) + C^T K \varepsilon(t)| < \delta_2$.
Hence, learning algorithm (19) will still be applicable even in the presence of a nonzero differentiable
error $\varepsilon(t)$ between the reference signal and the outputs of the tracking system at $\hat{\alpha} = \alpha$, $\hat{\beta} = \beta$. What
value of $\delta$ is admissible will depend on the dimension of the system.

5 Discussion 

Here we discuss multi-dimensional extensions with an eye for possible neural network applications
of our approach. Theorem 1 states that any continuous function of $t$ can be approximated over
the time interval $[0,T]$ by a linear combination of the solutions of system (7). It is worth noting
that we can choose function $g(t)$ in such a way that the following equality holds:

$$g(t) = \tilde{g}(\zeta(t)), \qquad (20)$$

where $\tilde{g} \in C^1$ and $\zeta(t)$ is a smooth function of $t$. Let us suppose that system (7) realizes function
$\tilde{g}(\zeta)$. This means that

n 
i=l 

where = — (3iXi). Then we consider function g{^) as a function of time t which satisfies 

equation Therefore due to formula we can write: 

n 

~g{m) = j:^Mm)- 

4 = 1 

Moreover 

Hence under the following assumptions: g{t) = git) at t = and g{0) = g{^{0)) we can see that 
linear combination J27=i CiXi(t) of the solutions of system 
realizes function $g(t)$ and vice versa. This simple observation suggests how to extend the result
to the multi-dimensional case. It is possible to consider a reference function $\tilde{g}(\zeta_1, \dots, \zeta_m)$ with
$m$ inputs as a function of time $t$: $\tilde{g}(\zeta_1(t), \dots, \zeta_m(t))$. Then a system which realizes function
$\tilde{g}(\zeta_1(t), \dots, \zeta_m(t))$ can be represented in the following form:

. _ /- . \ 

\j=l J 

$$y(\mathbf{x}(t)) = \sum_{i=1}^{n} c_i x_i(t). \qquad (21)$$

If we return to the approximation problem we may observe, on account of Theorem 1, that system 
(21) is able to approximate a given function $g(\xi_1, \dots, \xi_m)$ over a given compact domain in such 
a way that for a particular trajectory $(\xi_1(t), \dots, \xi_m(t))$ and any given constant $\varepsilon > 0$ there exist 
parameters $\alpha_{ij}$, $\beta_{ij}$, $c_i$, initial conditions and a number $n$ satisfying the following: 

$$|g(\xi_1(t), \dots, \xi_m(t)) - y(\mathbf{x}(t))| < \varepsilon.$$

Curve $\xi(t)$ should be designed in such a way that good approximation along the curve implies 
good approximation along the whole surface. Intuitively, this depends on the degree to which the 
curve "covers" the space. In other words, the more complex the curve $(\xi_1(t), \dots, \xi_m(t))$ is, the better 
the approximation that can be achieved over the given compact interval. 

An important consequence of this description is that a system of coupled logistic differential 
equations (21) may realize an approximation of a nonlinear time-invariant system of the following 
type: 

$$\dot{y} = \chi(y), \qquad (22)$$

where $\chi(\cdot) : R^n \to R^n$ is an arbitrary smooth function. Let us explain this. Denote: 

$$\mathcal{F}(\mathbf{a}, \mathbf{b}, \mathbf{c}, t) = \sum_{i=1}^{n} c_i f(a_i t + b_i).$$
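As a minimal sketch of the function denoted above, the following Python snippet evaluates a superposition of sigmoids $\sum_i c_i f(a_i t + b_i)$. The logistic form of $f$ and the names `sigmoid` and `superposition` are assumptions made for illustration; the numeric parameter values are arbitrary and are not taken from the paper.

```python
import math

def sigmoid(s):
    """Logistic sigmoid f(s) = 1 / (1 + exp(-s)) (assumed form of f)."""
    return 1.0 / (1.0 + math.exp(-s))

def superposition(a, b, c, t):
    """Evaluate sum_i c_i * f(a_i * t + b_i) for parameter lists a, b, c."""
    return sum(ci * sigmoid(ai * t + bi) for ai, bi, ci in zip(a, b, c))

# Illustrative (hypothetical) parameters for a two-term superposition.
a = [1.0, 0.5]
b = [0.0, -1.0]
c = [2.0, -1.0]
value_at_zero = superposition(a, b, c, 0.0)
```

The static parameters $a_i$, $b_i$, $c_i$ enter this expression nonlinearly through $f$, which is the difficulty the dynamical reformulation is meant to avoid.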

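Consider system (21) for $m = 1$ and replace $\dot\xi(t)$ by $\xi(t)$: 

$$y(\mathbf{x}(t)) = \sum_{i=1}^{n} c_i x_i(t) = \mathcal{F}\left(\alpha, \beta, \mathbf{x}_0, C, \int_0^t \xi(\tau)\,d\tau\right). \qquad (23)$$

One may substitute function $y(t)$ in (23) instead of $\xi(t)$. This leads immediately to the following 
equations: 

$$\dot{x}_i = \alpha_i y(t) x_i (1 - \beta_i x_i),$$

$$y(t) = \mathcal{F}\left(\alpha, \beta, \mathbf{x}_0, C, \int_0^t y(\tau)\,d\tau\right). \qquad (24)$$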
Denoting $z(t) = \int_0^t y(\tau)\,d\tau$ and taking into account that $y = \sum_{i=1}^{n} c_i x_i$, rewrite system (24) 
in the following manner: 

$$\dot{x}_i = \alpha_i \left(\sum_{j=1}^{n} c_j x_j\right) x_i (1 - \beta_i x_i), \qquad \dot{z} = \sum_{i=1}^{n} c_i x_i, \qquad (25)$$

where the new output function $z(t)$ satisfies the following differential equation: 

$$\dot{z} = \mathcal{F}(\alpha, \beta, \mathbf{x}_0, C, z).$$

$\mathcal{F}(\alpha, \beta, \mathbf{x}_0, C, z)$ may realize function $\chi(z)$ with given tolerance subject to the choice of the 
parameters $\mathbf{x}_0$, $C$ and the number of equations in (25). In the same fashion one can derive the 
results for $m > 1$ and obtain the corresponding systems of differential equations: 

$$\dot{z}_i = \mathcal{F}_i(\alpha, \beta, \mathbf{x}_0, C, z_1, z_2, \dots, z_i, \dots, z_n),$$

thus approximating (22). 
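The step from (24) to (25) can be checked numerically. The sketch below assumes the coupled form $\dot{x}_i = \alpha_i \dot{z}\, x_i (1 - \beta_i x_i)$, $\dot{z} = \sum_i c_i x_i$ with $z(0) = 0$, integrates it with a hand-written fourth-order Runge-Kutta step, and compares each state with the sigmoid $\frac{1}{\beta_i} f(\alpha_i z + b_i)$ evaluated at the current $z$. All parameter values and function names are hypothetical, chosen only for illustration.

```python
import math

ALPHA = [1.0, 0.5]   # hypothetical alpha_i
BETA = [1.0, 2.0]    # hypothetical beta_i
C = [1.0, -0.5]      # hypothetical output weights c_i
X0 = [0.2, 0.1]      # initial conditions with beta_i * x_i(0) in (0, 1)

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def rhs(state):
    """Right-hand side of the coupled system; state = [x_1, ..., x_n, z]."""
    *x, _z = state
    z_dot = sum(ci * xi for ci, xi in zip(C, x))
    x_dot = [al * z_dot * xi * (1.0 - be * xi)
             for al, be, xi in zip(ALPHA, BETA, x)]
    return x_dot + [z_dot]

def rk4_step(state, h):
    """One classical Runge-Kutta step of size h."""
    def shifted(k, scale):
        return [s + scale * ki for s, ki in zip(state, k)]
    k1 = rhs(state)
    k2 = rhs(shifted(k1, h / 2))
    k3 = rhs(shifted(k2, h / 2))
    k4 = rhs(shifted(k3, h))
    return [s + h / 6 * (p + 2 * q + 2 * r + w)
            for s, p, q, r, w in zip(state, k1, k2, k3, k4)]

def x_of_z(i, z):
    """Sigmoid in z that x_i should follow: f(alpha_i z + b_i) / beta_i."""
    b = math.log(BETA[i] * X0[i] / (1.0 - BETA[i] * X0[i]))
    return sigmoid(ALPHA[i] * z + b) / BETA[i]

state = X0 + [0.0]
for _ in range(5000):            # integrate over 5 time units, h = 1e-3
    state = rk4_step(state, 1e-3)

z_final = state[-1]
max_gap = max(abs(state[i] - x_of_z(i, z_final)) for i in range(len(X0)))
```

Because $dx_i/dz = \alpha_i x_i (1 - \beta_i x_i)$, each $x_i$ is a sigmoid of $z$, so $\dot{z} = \sum_i c_i x_i$ is a sigmoid superposition in $z$; this is the sense in which the right-hand side of the $z$-equation can approximate a smooth $\chi(z)$.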

There are two important observations to be made regarding system (25). First, one may 
notice that system ()25p is a specific instance of the Cohen-Grossberg model jH]- Therefore, it is 
possible to claim that Cohen-Grossberg models of several differential equations, each of which has 
relatively simple description (for instance, coupled logistic differential equations), in principle, are 
capable of approximating every nonlinear dynamical system with smooth right-hand sides (subject 
to appropriate choice of the number of differential equations, initial conditions and parameters). 
Furthermore, the learning algorithms, introduced in the paper can be applied to these models as 
well, and their stability may be proven in the same fashion. Second, it is desirable to notice that 
this approach allows us to introduce an alternative learning technique to that of backpropagation 
through time [10], albeit for continuous-time systems. A detailed discussion of these topics is 
beyond the scope of the present paper. 

The algorithms introduced in the paper guarantee that under certain circumstances the 
estimates $\hat\alpha$, $\hat\beta$ approach a domain around $\alpha$, $\beta$. Still, they cannot guarantee that $\hat\alpha \to \alpha$ and 
$\hat\beta \to \beta$. An interesting problem, therefore, is whether it is possible to design a tracking system 
that guarantees convergence of $\hat\alpha$, $\hat\beta$ to $\alpha$ and $\beta$ respectively. This problem in our opinion is closely 
related to the problem of adaptive observer design j2Hl for the reference system in Q: 



X = E 6,(x)ai + E 6,* (1 - A(t, D)) - A(t, D)a(x - x(0)) 

\i=l i=l / 

l/(x(t)) = C^x. (26) 
A prerequisite for applying the corresponding method is that these systems are transformed into 
the canonical observable form [5]. For nonlinear systems that are linear in parameters, necessary 
and sufficient conditions for this have been given [23]. These conditions do not hold, however, for 
the parameterizations of type (26). Therefore, the question remains open whether it is possible 
to find a linearly parameterized nonlinear system and corresponding output function $y(\mathbf{x})$, 
such that 1) its parameters can be transformed by one-to-one mapping into those of sigmoid 
superposition, and 2) the parameterization of this system obeys assumptions introduced in work 
(see Theorem 3.1). If one finds such a suitable parameterization, then the problem of finding 
the "true" parameters (subject to permutations) can be solved effectively. 

6 Examples 

In this section we illustrate the theoretical results with examples. First we consider application of 
Theorem 2 to the search for unknown parameter values of a single sigmoid function and then show 
the effectiveness of our method in comparison with conventional schemes for a two-dimensional 
optimization problem. In addition we illustrate our method with the results of computer 
simulations performed for a system consisting of 10 sigmoidal functions. 

6.1 Example 1 

Let us illustrate the possibility to search for the parameters $a_i$ and $c_i$ simultaneously. As has been 
suggested in Section 3, instead of the parameters $a_i$ and $c_i$ we will deal with $\alpha_i = a_i$ and $\beta_i = a_i/c_i$. 
Reference function $g(t)$ has been chosen to satisfy: 



where $\alpha = 2/3$, $\beta = 1/3$, $l_0 = 1$, $x(0) = 0.1$, $K(t) = 0.2$, $e = \hat{x} - x$. Function $\lambda(t)$ was chosen to 
be a periodic function with period $T = 10$ sec, pulse width 1 sec and unit amplitude (one may 
easily check that this parameter setting ensures exact matching between function $g(t)$ and $x(t)$ 
over time interval $[0, 9]$). 
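The matching claim can be checked with a short numerical sketch. It assumes that on $[0, 9]$ (where $\lambda(t) = 0$) the reference state obeys $\dot{x} = \alpha x - \beta x^2$ and that the reference function is a scaled sigmoid $c\,f(at + b)$ with $a = \alpha$ and $c = \alpha/\beta$. The offset $b$ is not quoted in the text; here it is derived from the requirement $g(0) = x(0)$, and the function names are hypothetical.

```python
import math

ALPHA, BETA = 2.0 / 3.0, 1.0 / 3.0   # values quoted in Example 1
A, C = ALPHA, ALPHA / BETA           # a = alpha = 2/3, c = alpha / beta = 2
X0 = 0.1                             # x(0) quoted in Example 1

def reference(t):
    """Scaled sigmoid c * f(a t + b); b is fixed by g(0) = x(0) (assumption)."""
    b = math.log(X0 / (C - X0))
    return C / (1.0 + math.exp(-(A * t + b)))

def euler_logistic(t_end, h=1e-4):
    """First-order Euler solution of dx/dt = alpha x - beta x^2 from x(0)."""
    x = X0
    for _ in range(int(round(t_end / h))):
        x += h * (ALPHA * x - BETA * x * x)
    return x

gap_at_9 = abs(euler_logistic(9.0) - reference(9.0))
```

The remaining gap is the Euler discretization error, which shrinks with the step `h`.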



g{t,a,c) = J 




where c = 2, a = 2/3. We design the reference and tracking systems as follows: 

x= {ax- f3x^){l - A(t)) - X{t){loa{x - x(0))) 

£ = (df - _ A(t)) - \{t){loa{x - a;(0))) - K{t)e, 



(27) 



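Adaptation rules to adjust the parameters $\hat\alpha$ and $\hat\beta$ may be written as follows: 

$$\dot{\hat\alpha} = -0.2\, e(t)\, \hat{x}(t)\,(1 - \lambda(t)),$$
$$\dot{\hat\beta} = 0.2\, e(t)\, \hat{x}^2(t)\,(1 - \lambda(t)). \qquad (28)$$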
In order to make the example more illustrative we compare the performance of 
algorithm (28) with a conventional pattern-by-pattern gradient scheme: 

$$\dot{\hat a} = -0.2\, e(t)\, \frac{\partial g(t, \hat a, \hat c)}{\partial \hat a}, \qquad \dot{\hat c} = -0.2\, e(t)\, \frac{\partial g(t, \hat a, \hat c)}{\partial \hat c}, \qquad (29)$$

and batch rule: 

$$\dot{\hat a} = -0.2\, \frac{\partial J(\hat a, \hat c)}{\partial \hat a}, \qquad \dot{\hat c} = -0.2\, \frac{\partial J(\hat a, \hat c)}{\partial \hat c}, \qquad (30)$$

where 

$$J(\hat a, \hat c) = \int_0 \left(g(\tau, \hat a, \hat c) - g(\tau, a^*, c^*)\right)^2 d\tau.$$

Results of such a comparison are shown in Figures 2-5. In Figure 2 there are two trajectories 
of the parameters $a(t)$ and $c(t)$ in two-dimensional space. The first curve is obtained from the 
trajectories of $a(t) = \hat\alpha(t)$, $c(t) = \hat\alpha(t)/\hat\beta(t)$ and results from algorithm (28) with initial conditions 
$\hat\alpha(0) = -3$, $\hat\beta(0) = 1$. Curve 2 is a solution of (29) starting from initial conditions $\hat a(0) = -3$, 
$\hat c(0) = -3$. It can be seen that algorithm (28) reaches the global minimum. Conventional gradient 
descent fails to do so. It appears unstable and goes through a neighborhood of the global minimum 
along a valley. This process is shown in Fig. 2. In addition, algorithm (28) is much faster than 
(29) (see Fig. 3 for details). 

Figure 4 reflects another interesting feature of algorithm (28). Whereas the conventional 
gradient algorithm starting from $\hat a(0) = 3$, $\hat c(0) = -3$ goes towards the goal along the isolines 
(Curve 2), algorithm (28) does not stick to isolines. Instead, it goes through infinity in the 
coordinates $a$, $c$. This is not because of any singularities with respect to the coordinates $\hat\alpha$, $\hat\beta$ but 
is due simply to the transformation $c = \alpha/\beta$, when $\hat\beta$ goes through zero. 

Figure 5 contains the trajectories of the solutions obtained with algorithm (30). Curve 1 
shows the trajectory corresponding to initial conditions $\hat a(0) = -3$, $\hat c(0) = -3$; Curve 2 is related 
to initial conditions $\hat a(0) = 3$, $\hat c(0) = -3$. It is easy to see that this algorithm gets stuck in local 
minima. 

The performance of algorithm (28) is not surprising because it uses information about the 
system properties in a more intelligent way than gradient descent methods do. In addition, a 
coordinate transformation has been used and the process of searching for the minimum is organized 
in a different coordinate system. All the results relating to stability, however, remain true only for 
functions which may be represented by a superposition of sigmoid functions. 
6.2 Example 2 

In addition to the simple example of the previous section, which merely illustrates the design 
procedure for the parameter adjustment rules proposed in the paper, we present 
further supporting results of computer simulation of our algorithms for a larger number of functions 
in superposition. We consider the sum of 10 sigmoid functions 

$$g(t, \mathbf{a}, \mathbf{c}) = \sum_{i=1}^{10} \frac{c_i}{1 + e^{-(a_i t + b_i)}},$$

where parameters $b_i$ and $c_i$ are assumed to be known and $t \in [0, T]$. According to the results 
presented, this sum is equivalent to the solutions of the corresponding system of logistic equations 
with known $\beta_i$, $c_i$ and initial conditions. The only uncertainties are in parameters $\alpha_i$. First, 
we extend the reference signal $g(t)$ to be periodically repeated over $[0, \infty)$: 

$$\bar{g}(t) = \begin{cases} g(t), & t \le T, \\ 0, & T < t < T + \Delta T_2, \\ \bar{g}(t - T - \Delta T_2), & t \ge T + \Delta T_2. \end{cases}$$
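A minimal sketch of this periodic extension, assuming the signal repeats with period $T + \Delta T_2$ and is zero during the pause of length $\Delta T_2$. The helper name `periodic_extension` and the sample signal are hypothetical.

```python
def periodic_extension(g, T, dT2):
    """Return g_bar: g on [0, T], zero on (T, T + dT2), repeated periodically."""
    period = T + dT2

    def g_bar(t):
        if t < 0:
            raise ValueError("the extension is defined for t >= 0 only")
        s = t % period          # position inside the current period
        return g(s) if s <= T else 0.0

    return g_bar

# Usage with the Example 2 timing (T = 2, dT2 = 1) and a placeholder signal.
g_bar = periodic_extension(lambda t: t + 1.0, T=2.0, dT2=1.0)
samples = [g_bar(0.5), g_bar(2.5), g_bar(3.5)]
```

The pause interval is where the pulse function forces the states back to their initial conditions, so each period starts from the same point.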

Then we design the tracking system 

ki = adii^ - Xi){l - X{t, D)) + ki{t)e{l - X{t, D)) - A(t, D)ka{xi - x,(0)) (31) 
and adaptation algorithm 

at = --fSs{e)eXi{l - Xi){l - X(t, D)) 

kit) = —iS&{e)^c,{l - A(t, D)) (32) 

where D = 10 (taking into account that < 1 we have to choose D > 1), X{t, D) is a T + AT2 
periodic function with the pulse width AT2, 6 = 0.0001, 7 = 0.001, T = 2, AT2 = 1, Iq = 10. 
Initial conditions $x_i(0)$ and parameters $c_i$ were randomly chosen and their exact values are given 
below: 

i     x_i(0)   c_i
1     0.1      3
2     0.2      5
3     0.3      -3
4     0.2      0.5
5     0.5      -1
6     0.1      2
7     0.7      -0
8     0.2      5.5
9     0.6      -3
10    0.4      2


One could choose the functions $k_i(t)$ to be equal to some constants over $[0, \infty)$. This however 
would require knowledge of the exact value for the width of the dead-zone (parameter $\delta$) in the 
adjustment algorithm for this particular set of $k_i(t)$. 

We simulated tracking system (31) with algorithm (32) for 400 trials, choosing the initial 
conditions for the estimates $\hat\alpha(0)$ randomly in the hypercube $[0, 12]^{10}$ for every trial; initial conditions 
for $k_i(t)$ were set to zero. Each trial consisted of 10000 periods (epochs) and each epoch lasted 
for $T + \Delta T_2 = 3$ seconds. In order to check the sensitivity of the approach to the numerical 
integration we used a simple first-order Euler method with integration step $\delta t = 0.0001$ 
seconds to approximate the solutions of and ki{t). In order to judge effectiveness of 

our algorithm we introduced the following criteria: 



d{t) 



\ 



10 



^2 



i=l 



^ ^ ^ T + AT2 

The histograms of the distributions of distances $d(t)$ and performance indices $R(t)$ computed at the end 
of each trial are shown in Fig. 6 and 7, respectively (we made sure that $d(0) - d((T + \Delta T_2)10000) > 0$ 
for every trial). It can be clearly seen from the figures that after application of 
algorithm (32) the distributions of the distances $d((T + \Delta T_2)10000)$ and performance indices 
$R((T + \Delta T_2)10000)$ are significantly shifted to the left, towards zero. 



7 Conclusion 

In this work the problem of estimating the parameters for a function represented by sigmoid 
superposition has been analyzed. The key to our proposal is the transformation of this static 
nonlinearity into a linear combination of solutions of a system of differential equations. These 
equations are linear in parameters but nonlinear with respect to the state variables. We considered 
the dynamics of an unperturbed system of differential logistic equations. It was found that a linear 
combination of the system solutions may realize any continuous function over interval [0, T] with 
given tolerance $\varepsilon > 0$. This tolerance can be made arbitrarily small as a function of the number 
of equations, with corresponding parameters and initial conditions. In addition, we showed that 
a system of logistic equations with time-varying parameters can realize a function with multiple 
inputs. The results enabled us to consider a system with equations coupled via output function 
$y(\mathbf{x})$ as a generator of almost any dynamical system as long as it is smooth in its state and output 
variables. 

The linearity of the resulting system with respect to its unknown parameters allowed us to 
apply conventional methods and ideas of adaptive control in order to estimate their values for a 
given reference function. Extension of both the reference and tracking signals to be repeatable 
(periodic) over the $[0, \infty)$ interval played a crucial role in our analysis. This feature makes it possible 
to use known matching conditions (or certainty equivalence) to design the adaptation algorithms. 
Stability analysis has been performed for the learning schemes introduced. 

The current algorithm is able to produce estimates that approach the true values of the 
unknown system parameters within a bounded domain. However, convergence to these true values 
cannot be guaranteed. It should be mentioned, however, that the problem of finding a flawless 
algorithm is all but solved by our proposal. The most difficult hurdles to knock down were shown 
to be the boundedness of solutions and the problem of determining the maximum amplitude of 
unmodeled dynamics (when the reference signal is not exactly a superposition of sigmoid 
functions). Though we offered possible solutions to these issues in the present paper, more effective ones 
may still exist. Finding these may be a topic for future research. 
8 Appendix 

Theorem 1 proof. We prove the theorem in 3 steps. First, we transform the original system (7) 
into a system with its right-hand side depending on one set of parameters only ($\alpha = (\alpha_1, \dots, \alpha_n)^T$ 
instead of the two sets $\alpha$ and $\beta$). Second, for each $\tilde{x}_i$, $i \in \{1, \dots, n\}$, we show that the solution 
$\tilde{x}_i(t)$ belongs to the interval $[0, 1]$ for any $\tilde{x}_i(0) \in (0, 1)$, and that $\tilde{x}_i(t)$ is a monotonic and sigmoidal function 
with parameters depending on $\alpha$ and initial conditions. Therefore, to conclude the proof it is 
sufficient to apply a widely-known result from approximation theory [9], [13]: if $f$ is any continuous 
sigmoidal function, then finite sums of the form $\sum_{i=1}^{N} c_i f(a_i x + b_i)$, $a_i \in R^n$, $x \in R^n$, $b_i \in R$, 
are dense in $C(I_n)$. 

Let us start with 

Lemma 2 Let system (7) be given and $\beta_i \neq 0$. Then there is a linear transformation $\tilde{x}_i = \beta_i x_i$ 
of system (7) coordinates such that the following holds: 

$$\dot{\tilde{x}}_1 = \alpha_1 \tilde{x}_1 (1 - \tilde{x}_1); \qquad \dot{\tilde{x}}_2 = \alpha_2 \tilde{x}_2 (1 - \tilde{x}_2); \qquad \dots$$

$$y(\mathbf{x}) = C^T \mathbf{x} = \sum_i \frac{C_i}{\beta_i} \tilde{x}_i, \qquad \tilde{x}_i(0) = \beta_i \lambda_i. \qquad (33)$$

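Lemma 2 proof. The proof is a routine procedure. Let us calculate the derivative of $\tilde{x}_i = \beta_i x_i$: 

$$\dot{\tilde{x}}_i = \beta_i \dot{x}_i = \alpha_i \beta_i x_i (1 - \beta_i x_i) = \alpha_i \tilde{x}_i (1 - \tilde{x}_i).$$

The rest of the proof is straightforward and is omitted. The lemma is proven. 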

Remark 4 Note that the linear transformation $\tilde{x}_i = \beta_i x_i$ is one-to-one, and for 
any system (33) we can derive its transformed version in the form of system (7) by the inverse 
transformation $x_i = (1/\beta_i)\tilde{x}_i$. Therefore in the rest of the proof we will deal with system (33). In 
addition, it is always possible to make a transformation such that the resulting $\alpha_i$ will be positive. 
Furthermore, given system ©, one can choose such linear transformation Xj = that the 

transformed system obeys 

xi = aiXi{l — PiCiXi); 
X2 = a2X2{l - I32C2X2)] 



y(x) = C'^x = ^^a;, = ^x„ x,(0) = A,/C„ (34) 

i i 

thus eliminating the parametric uncertainties in output function $y(\mathbf{x})$ and replacing them by the 
parametric uncertainties of linearly parameterized system (34) with known output function $y(\mathbf{x})$. 

Let us consider the properties of each $i$-th equation of system (33). We formulate the next 
lemma: 

Lemma 3 Let the following differential equation be given: 

$$\dot{x} = kx(1 - x), \qquad k \neq 0, \qquad (35)$$

and let $x(t)$ be a solution of (35) for initial condition $x(0) = x_0$, $x_0 \in (0, 1)$. Then the next 
statements hold for equation (35): 

1) $x(t)$ is a monotonic function with respect to $t > 0$; 

2) $x(t) \to 1$ at $t \to \infty$ for $k > 0$ and $x_0 \in (0, 1)$; $x(t) \to 0$ at $t \to \infty$ for $k < 0$ and $x_0 \in (0, 1)$; 

3) $x(t)$ is unique for any $t > 0$ and initial condition $x_0 \in (0, 1)$. 
Lemma 3 proof. Statement 1) of the lemma is obvious and its proof has therefore been skipped 
here (see, for example j^). Let us prove statement 2) of the lemma. We consider the following 
function: 

$$V(x) = 0.5(x - 1)^2. \qquad (36)$$

It is clear that function $V(x)$ is well-defined and positive definite for any $x > 0$. Moreover, 
$V(x) \to \infty$ at $x \to \infty$ and $V(x) = 0$ at $x = 1$. These facts allow us to consider function $V$ as a 
Lyapunov candidate for system (35). Let us calculate $\dot{V}$: 

$$\dot{V} = (x - 1)\dot{x} = -kx(1 - x)^2 \le 0.$$

We observe that $V > 0$ and $\dot{V} = -kx(1 - x)^2 < 0$ for $x > 0$, $x \neq 1$. For any $x(0) \in (0, 1)$, 
$V(x(0)) - V(x(t)) > 0$ and therefore $x(t) > x(0)$. Hence the next inequality holds: 

$$\dot{V} = (x - 1)\dot{x} \le -kx(0)(1 - x)^2.$$

Since $(1 - x)^2 = 2V(x)$, this can be written as follows: 

$$\dot{V} \le -kx(0)\,2V(x).$$

Hence $V \to 0$ asymptotically, and $x(t) \to 1$ at $t \to \infty$ for any $x(0) \in (0, 1)$. To prove the second 
part of statement 2, where $k < 0$, it is sufficient to consider the following Lyapunov candidate: 
$V(x) = 0.5x^2$. Its derivative satisfies the following equation: $\dot{V}(x) = kx^2(1 - x)$ and is obviously 
negative definite over $x \in [0, 1)$. 
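The Lyapunov argument can be illustrated numerically. The sketch below integrates equation (35) with a first-order Euler scheme and checks that $V(x) = 0.5(x-1)^2$ does not increase for $k > 0$ and that $V(x) = 0.5x^2$ does not increase for $k < 0$. The values of $k$, $x(0)$ and the step size are arbitrary illustrative choices, not taken from the paper.

```python
def euler_logistic(k, x0, h=1e-3, steps=10000):
    """Euler trajectory of dx/dt = k x (1 - x) starting from x(0) = x0."""
    xs = [x0]
    for _ in range(steps):
        x = xs[-1]
        xs.append(x + h * k * x * (1.0 - x))
    return xs

def is_nonincreasing(values):
    return all(later <= earlier for earlier, later in zip(values, values[1:]))

# k > 0: V(x) = 0.5 (x - 1)^2 should decay and x(t) should approach 1.
growing = euler_logistic(k=1.5, x0=0.1)
v_growing = [0.5 * (x - 1.0) ** 2 for x in growing]

# k < 0: V(x) = 0.5 x^2 should decay and x(t) should approach 0.
decaying = euler_logistic(k=-1.5, x0=0.9)
v_decaying = [0.5 * x ** 2 for x in decaying]
```

A decreasing $V$ along a sampled trajectory is only an illustration of the lemma, not a substitute for the proof.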

Uniqueness of $x(t)$ follows directly from the continuity of the right-hand side of equation (35) [34]. The lemma 
is proven. 

Regarding Lemma 3 we observe that the solutions of system (33) for $t > 0$ are completely defined 
by the choice of initial conditions $\tilde{x}_i(0)$. This means that if $\tilde{x}_i(t + \tau)$ and $\bar{x}_i(t)$ are solutions of 
system (33) and $\tilde{x}_i(t + \tau) = \bar{x}_i(t)$ for any $t > 0$, then 

$$\tilde{x}_i(t + \tau) = \bar{x}_i(t) \;\Rightarrow\; \tilde{x}_i(\tau) = \bar{x}_i(0).$$

In other words, for each solution $\tilde{x}_i(t)$ a time-shift is equivalent to a choice of initial conditions. 
Moreover, it is easy to see that for any $\tau \in (-\infty, \infty)$ and $\tilde{x}_i(0) \in (0, 1)$ there is an initial 
condition $\bar{x}_i(0)$ such that $\tilde{x}_i(t + \tau) = \bar{x}_i(t)$. 
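The equivalence between a time-shift and a change of initial condition can be checked with the closed-form logistic solution. The sketch below assumes the solution of $\dot{x} = kx(1-x)$ in the form $x(t) = 1/(1 + e^{-(kt + b)})$ with $b = \ln\frac{x(0)}{1 - x(0)}$; the numeric values of $k$, $x(0)$ and $\tau$ are hypothetical.

```python
import math

def logistic_solution(k, x0, t):
    """Closed-form solution of dx/dt = k x (1 - x) with x(0) = x0 in (0, 1)."""
    b = math.log(x0 / (1.0 - x0))
    return 1.0 / (1.0 + math.exp(-(k * t + b)))

k, x0, tau = 0.8, 0.3, 1.7

# Initial condition that reproduces the trajectory shifted by tau.
shifted_x0 = logistic_solution(k, x0, tau)

gap = max(
    abs(logistic_solution(k, x0, t + tau) - logistic_solution(k, shifted_x0, t))
    for t in (0.0, 0.5, 1.0, 2.0, 4.0)
)
```

Starting from `shifted_x0` gives the same curve as waiting `tau` time units from `x0`, which is why the offsets $b_i$ of the sigmoids can be encoded entirely in the initial conditions.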

All we have to prove now is that $\tilde{x}_i(t)$ is a sigmoidal function. Let us consider $\tilde{x}_i$. As follows 
from the system equations, the time-derivative of $\tilde{x}_i(t)$ is 

$$\frac{d\tilde{x}_i(t)}{dt} = \alpha_i \tilde{x}_i(t)(1 - \tilde{x}_i(t)),$$

then 

$$\tilde{x}_i(t) = \int \alpha_i \tilde{x}_i(t)(1 - \tilde{x}_i(t))\,dt = f(\alpha_i t + b_i) + D, \qquad (37)$$

where 

$$f(\alpha_i t + b_i) = \frac{1}{1 + e^{-(\alpha_i t + b_i)}}, \qquad D = 0.$$

As initial conditions of system (33) completely define time-shifts of the solutions $\tilde{x}_i(t)$, coefficients 
$b_i$ in (37) depend on initial conditions $\tilde{x}_i(0)$ only. 

We just proved that the $i$-th solution of system (33) can be written in the following manner: 

$$\tilde{x}_i(t) = f(\alpha_i t + b_i),$$

where $b_i \in (-\infty, \infty)$, $b_i = f^{-1}(\tilde{x}_i(0))$ depends on $\tilde{x}_i(0) \in (0, 1)$ explicitly and $f(\cdot)$ is the sigmoid 
function. Let us consider output $y(\mathbf{x})$ of system (33): 

We denote q = Ci/Pi, so ?/(x) can be written in the form: 

n 
i=l 

Therefore, due to P, for any e > and g{t) G C^q^] there are such n, Ci and hi that the following 
inequality holds: 

$$\left|\sum_{i=1}^{n} c_i f(\alpha_i t + b_i) - g(t)\right| < \varepsilon$$

for t G [0,T]. To conclude the proof, it is sufficient to notice that parameters a,, and initial 
conditions Aj can be restored from hi and q. The theorem is proven. 

Lemma U\ proof. The lemma proof is trivial. Trajectories x(it) and x(t) of (jH)) are bounded, 
then sum 

n 

E («^(ei,(x) - 6,(x)) + A(6,(x) - 6,(x))) (1 - A(t, D))| < D2, 

i=l 

where 1^2 > 0. Therefore the coefficients k* (if exist) should satisfy the following inequality 

— <^<-T.k:c^ 

^ " 1=1 

for e > S > 0. Vector C 7^ 0, hence there exists at least one q 7^ 0. Therefore there exists at least 
one vector k* = {kl, . . . , k^)^ such that 
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Therefore inequality (llUj) is satisfied for every e > 6 > 0. The lemma is proven. 

Theorem proof. According to the theorem assumptions vector C 7^ 0. Therefore, from 
Lemma H it follows that there exist coefficients k* such that 



E («^(ei,(x) - 6,(x)) + A(6,(x) - 6,(x))) (1 - A(t, D))\ + eJ2 k*c^ < (3^ 

i=l i=l 

for any e > 6 — 61, where 6 > 61 > 0. Define the following set of time intervals: 

At,o = {A(2z,2^ + 1) = [t2i,t2i+i]\Xit,D) = Vt G [ta^.W], 
i e Af, to < ti . . . < tj < tj+i < <...}. 

At^l = {n{2^ + l,2^ + 2) = {t2^+l,t2i+2)\Ht,D) = l\/te{t2i+ut2i+2), 
ti <. t2 ■ ■ ■ <. tj <. tj^i < tj^2 <^ . . .}. 

Consider the following positive-definite function 

V{e, a, p, K) = Ss{y)ydu + 0.5||d - a||2„, + 0.5||/3 - (3\\t^-, + 0.5||X(t) - k 
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where k* satisfy inequality for every e > 5 — 5i. Its time-derivative over the set Atfi can be 
expressed as follows 

-V{e, a, K) = S^ie) f ee - ^ ((d, - «,)eC%,(x) - (A - A)eC%,(x) - (A;,(t) - kl)e^cM 
It is clear that = for any |e| < 5 as ^'^(e) = for all |e| < 5. Let |e| > 5, then 

= Ss{e)e (^C^ (a.^i^lx) - a,6,(x) + A6,(x) - A6,(x)) + C^ir(t)e^ - 
^5(e)e - a,)C%,(x) - (A - A)C%,i(x) - (A;,(t) - A;*)eQ^ 

= Ss{e)e C^a.(6,(x) - iiA^)) + C^A(6,(x) - 6,(x)) + Kc,^ 

< Ss{e)\e\ (^\ (^X:C^«.(eM(x)-6,i(x)) + C^Afe,i(x)-6,(x))) l + E^^^^^ 

n n 

< Ssie)\e\Y,k*c,{\e\-5 + 5,) < Ss{e)\e\J2k*cA < 0. (39) 

i=l i=l 

(In order to get the last inequality note that sum J27=i K'^i Kiust be negative.) Taking into account 
that V is not positive over [t2i,^2i+i] and that e{ti) = (because the states of both reference and 
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tracking systems are forced to move to x(0) over i), one can write 

I^(e(t2^),«(t2i),/3(t2*),i^(t2^)) - V^(e(t2^+i),a(t2m),/3(W),^ 
= 0.5||a(t2^) - + O.50ihi) - PW^-i + 0.5\\K{t2^) - k*f^-r - 

0.5\\a{t2i+i) - Qi||^-i - 0.5\\P{t2i+i) - - 0.5\\K{t2i+i) - k*\f^-^ 

/■*2i+i _ JL^ /■e(t2i+i) 



/■t2i+l re\t2i+lj 

I Ss{e)\e{T)2_^k*Cj\5idT+ S5{v)vdv. 

Jt2i j=l ^0 

Consider the following series: 

n 

W{n) = 0.5^(||dfe)-«||?-i + i|/3(t2i)-/5i|?-i + i|i^fe)-fc1l?-i- 

i=0 

\\a{t2i^,) - - ||^(t2m) - - ||i^(^2i+i) - . 

One can notice that 

||a(t2^+i) - + \m2^+l) - + ||i^(t2m) - A;* 11^-1 

= ||«(t2^+2) - + ||/3fe+2) - + \\K{t2^+2) ' ^ H^.i 

as vectors a, /9 and X remain constant over intervals A^^i. Therefore 

W{n) = 0.5(||d(to)-«||?-i + ||/3(to)-/3||?-i + ||i^(to)-r||2_,- 
||«(W) - - l|/3(Wi) - /Sll'-i - ll^(Wi) - 
> E/ 'S'5(e)|e(r)^A;*c,|(5i(ir + ^ / > 0. (40) 



Given that x(t), x(t) are bounded we can conclude that a, jS and K{t) are bounded and hence d, 
/9, are bounded. Furthermore, the following inequality holds 

0.5 (||d(to) - + Wo) - + ||i^(io) - > E / Ss{e)\e{T) k*c,\S,dr 

i=0 -'^'^i j=l 

= Z*'"^' -S5(e)A(r, L>)|e(r) fcjc.l^idr > 0. 
Hence 

0< / Ss{e)X{T, D)\e{T)J2k*Cj\SidT < OO. 

Let us consider the following time-intervals Aj — [T2i,T2i+i] '■ \e\X{t,D) > (5 Vt € Aj, i e 
{0, 1, ... , oo}. As \e{t)\ > 5 it is clear that 

POO ^ PCO ^ 

oo > / Ss(e)\(T,D)\e(T)yk*cMidT> / 55(e)A(r, Z})|5 V A;*c,-|(5idT 
Jo Jo 

oo n 

= E^^I'^X^*9l<5i>0. 
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Then series 



i=0 j=l 



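converges and, therefore, $\Delta_i \to 0$ as $i \to \infty$. In order to finish the proof of the theorem, it 
is sufficient to consider the error function $e(t)$ over intervals $\Delta_i$. Derivative $\dot{e}$ is bounded (say 
$|\dot{e}| < D_3$) as vectors $\mathbf{x}$, $\hat{\mathbf{x}}$, $\hat\alpha$, $\hat\beta$, $K(t)$ are bounded. Therefore for any $t \in \Delta_i$: 

$$|e(t)\lambda(t, D)| = \left|e(\tau_{2i}) + \int_{\tau_{2i}}^{t} \dot{e}(\tau)\,d\tau\right| \le |e(\tau_{2i})| + \left|\int_{\tau_{2i}}^{t} \dot{e}(\tau)\,d\tau\right| \le |e(\tau_{2i})| + |\Delta_i| D_3 = \delta + |\Delta_i| D_3.$$

Then 

$$\limsup_{t \to \infty} |e(t)\lambda(t, D)| = \delta.$$

Hence for any arbitrarily small $\delta_1 > 0$ there exists $t_1$ such that 

$$|e(t)\lambda(t, D)| < \delta + \delta_1$$

for any $t > t_1$. The theorem is proven. 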
Figure 2: Trajectories $a(t)$, $c(t)$ in system (27) with algorithm (28) (Curve 1) and algorithm (29) 
(Curve 2) starting from point $(-3, -3)$. Global minimum is marked by circle. 

Figure 3: Trajectories $a(t)$, $c(t)$ in system (27) with algorithm (28) (Curve 1) and algorithm (29) 
(Curve 2) starting from point $(-3, -3)$. The trajectories have been shown for time interval $[0, 900]$ 
sec. 

Figure 4: Trajectories $a(t)$, $c(t)$ in system (27) with algorithm (28) (Curve 1) and algorithm 
(29) (Curve 2) starting from point $(3, -3)$. Algorithm (28) ensures that the estimates reach a 
neighborhood of the global minimum in a very short time and then approach it with oscillations 
in the parameter space (blob-like part of the trajectory). 

Figure 5: Trajectories $a(t)$, $c(t)$ in system (27) with batch gradient algorithm (30) starting from 
point $(-3, -3)$ (Curve 1) and $(3, -3)$ (Curve 2). None reaches the global minimum (marked by 
circle). 

Figure 6: Histograms of the distributions of the distances $d((T + \Delta T_2)10000)$ (plot a) and $d(0)$ 
(plot b) for 400 trials with random initial conditions for the estimates $\hat\alpha_i(0)$. 

Figure 7: Histograms of the distributions of the performance indices $R((T + \Delta T_2)10000)$ (plot a) 
and $R(0)$ (plot b) for 400 trials with random initial conditions for the estimates $\hat\alpha_i(0)$. 